ABSTRACT

Isotonic estimation involves the estimation of a function which is known to
be increasing with respect to a specified partial order. For the case of a linear
order, a general theorem is given which simplifies and extends the techniques of
Prakasa Rao (1966) and Brunk (1970). Sufficient conditions for a specified limit
distribution to obtain are expressed in terms of a local condition and a global
condition. The theorem is applied to several examples. The first example is
estimation of a monotone function $\mu$ on [0,1] based on observations $(i/n, X_{ni})$,
where $E X_{ni} = \mu(i/n)$. In the second example, $i/n$ is replaced by random $T_{ni}$.
Robust estimators for this problem are described. Estimation of a monotone
density function is also discussed. It is shown that the rate of convergence
depends on the order of the first non-zero derivative and that this result can obtain
even if the function is not monotone over its entire domain.
SIGNIFICANCE AND EXPLANATION


In  many  experiments  one  would  expect  that  an  increase  in  "input"  will  produce 
an  increase  in  "output".  For  instance,  the  greater  the  vitamin  concentration,  the 
faster  the  growth  of  organisms;  the  greater  the  force  applied  to  a rod,  the  greater 
the  elongation.  However,  due  to  random  effects,  experimental  results  may  not  show 
the  expected  monotonic  behavior. 

The  most  common  method  for  dealing  with  this  situation  is  to  use  curve-fitting 
(e.g.,  by  least-squares),  assuming  some  parametric  form  (e.g.,  polynomial  behavior). 
However  there  are  many  situations  where  this  is  not  appropriate. 

This paper discusses the estimation of functions which are known to be monotone,
but which are not assumed to have any particular parametric form. The exact
distributions of these estimators are very complicated, but some limiting distributions are
known. The paper proves a relatively abstract theorem, which can then be used to
obtain all the known limiting distributions, and to extend these results. Some new
estimators are also described. The theorem is applied to estimation of monotone
functions and to estimation of monotone density functions.

The results are applied to one of the most common non-parametric methods for
dealing with situations where failure of the data to exhibit the expected monotonicity
is regarded as a sampling artifact, namely the isotonized mean: two neighbouring
observations that do not have the expected monotonic behavior are replaced by
two numbers, each equal to the mean of the observations. This procedure is repeated
until the estimator is a monotone function. The results in this paper show that the
isotonized mean is sensitive to extreme values. It is shown that linear combinations
of order statistics can be used to obtain estimators which are more robust than the
isotonized mean.
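The pooling procedure described above can be sketched in a few lines. This is a minimal illustration, not the report's own algorithm statement: the function name `isotonized_mean` is hypothetical, observations are assumed to be listed in increasing order of the input variable, and all observations are given equal weight.

```python
def isotonized_mean(x):
    """Pool adjacent violators: least-squares nondecreasing fit to x."""
    blocks = []  # each block is [sum of pooled values, number of pooled values]
    for value in x:
        blocks.append([float(value), 1])
        # Merge while the previous block mean exceeds the last block mean
        # (means are compared by cross-multiplying the positive counts).
        while len(blocks) > 1 and (
            blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]
        ):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted


if __name__ == "__main__":
    print(isotonized_mean([1, 3, 2, 4]))      # the pair (3, 2) is pooled
    print(isotonized_mean([0, 0, 100, 0, 0])) # one wild value shifts three fitted values
```

The second call illustrates the sensitivity to extreme values mentioned above: a single large observation is averaged into every fitted value to its right.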
The responsibility for the wording and views expressed in this descriptive summary
lies with MRC, and not with the author of this report.
1.  Introduction

Suppose for each of n independent variables $X_i$ there is a known $t_i$ such that the
distribution of $X_i$ is believed to be determined by and to vary with $t_i$. Let $F_{t_i}$ denote
the cumulative distribution function (CDF) of $X_i$. Let $\theta(\cdot)$ be a specified functional on a
subspace of cumulative distribution functions. $\theta$ induces $\mu$, a real-valued function on the
space of t's, by $\mu(t) = \theta(F_t)$. $\mu$ is an isotonic function if there is a partial order on the
space of t's such that whenever t is "greater than or equal to" s, $\mu(t) \ge \mu(s)$. This
paper concentrates on the case in which the $t_i$ are real numbers with the usual ordering, so that
$\mu$ being an isotonic function is equivalent to $\mu$ being a non-decreasing function. An isotonic (or
monotone) estimator of $\mu$ will be an estimator which always has the known monotonicity, but is not
restricted to a particular functional form. Use of an isotonic estimator is appropriate if the
order relation is certain, that is, if the failure of the observations to exhibit the specified
order is an artifact of the randomness of the observations dominating the unknown underlying
deterministic increasing function.

The least-squares solution to this problem has been known for some time. Ayer et al.
(1955) and van Eeden (195?) describe an estimator $\mu_n(t)$ (the isotonized mean) which is the
monotone function with smallest error sum of squares ($\sum_{i=1}^{n} (X_i - \mu(t_i))^2$, $\mu$ nondecreasing).
$\mu_n$ adaptively pools observations until the group means are increasing. Barlow et al. (1972,
Chapter 1) discuss several algorithms for computation of this estimator. We shall use the fact
that $\mu_n(s)$ is the left-hand slope of the greatest convex minorant of the cumulative sum
process of the X's ($\{(j, \sum_{i=1}^{j} X_i),\ 0 \le j \le n\}$). The asymptotic distribution of this estimator was
stated by Brunk (1970). This paper shows that Brunk's result can be sharpened and extended
through the use of a theorem on the distribution of the slopes of greatest convex minorants
of processes. This theorem can also be used to extend the results of Prakasa Rao (1969) on
estimation of monotone densities, as well as to obtain asymptotic distributions of other
estimators. The general theorem is stated in section 2, applications of the theorem are indicated
in section 3, and the general theorem is proved in section 4. The final section contains a
discussion of the relationship of the results described here to other research.
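The characterization of the isotonized mean as a slope of the greatest convex minorant of the cumulative sum diagram can be checked numerically. The sketch below is illustrative only: the function name `gcm_left_slopes` is hypothetical, the observations are assumed equally weighted and already ordered by $t$, and the minorant is computed with a standard lower-convex-hull scan rather than any algorithm taken from the report.

```python
def gcm_left_slopes(x):
    """Left-hand slopes, at j = 1..n, of the greatest convex minorant of
    the cumulative sum diagram {(j, x[0] + ... + x[j-1]), j = 0..n}."""
    points = [(0, 0.0)]
    running = 0.0
    for j, value in enumerate(x, start=1):
        running += value
        points.append((j, running))

    hull = []  # vertices of the lower convex hull, left to right
    for p in points:
        # Drop the last vertex while it lies on or above the chord to p,
        # i.e. while slope(hull[-2], hull[-1]) >= slope(hull[-1], p).
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            if (y2 - y1) * (p[0] - x2) >= (p[1] - y2) * (x2 - x1):
                hull.pop()
            else:
                break
        hull.append(p)

    slopes = []
    for (x1, y1), (x2, y2) in zip(hull, hull[1:]):
        # The same slope applies at every integer point in (x1, x2].
        slopes.extend([(y2 - y1) / (x2 - x1)] * (x2 - x1))
    return slopes


if __name__ == "__main__":
    print(gcm_left_slopes([1, 3, 2, 4]))
```

For the data 1, 3, 2, 4 the slopes are 1, 2.5, 2.5, 4, which is the nondecreasing least-squares fit obtained by pooling the violating pair (3, 2).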


2.  The General Theorem

The asymptotic distribution of these estimators is of interest because the finite sample
distributions are especially complicated in all but the very simplest cases. However, to
obtain limiting results, it is necessary to specify how the limits are obtained. If, for example,
the set of t's to which X's correspond is fixed (and hence finite), while the number of
X's observed at each t becomes infinite, and if the mean of those X's corresponding to a
particular t converges to $\mu(t)$ and $\mu$ takes on a distinct value at each of the t's for
which observations are recorded, then these means are asymptotically consistent, asymptotically
independent, and asymptotically normal (if rescaled in the usual manner). (See Parsons (197?)
for further discussion of this case.)

This paper concentrates on the case in which the number of distinct t's at which
observations are made becomes infinite. Exact conditions on the t's appear below. Meanwhile we
assume that for each n we observe $\{(T_{ni}, X_{ni}),\ i = 1(1)n\}$, where $X_{ni} \sim F_{T_{ni}}$. We can assume
that the observations are indexed so that $T_{ni}$ increases (strictly) with i for every n. If
the T's are random, $X_{ni}$ is thus the concomitant of the ith order statistic of the T's.
Since the isotonicity of the underlying function $\mu$ is preserved by monotone increasing
functions of T, it will often be convenient to work with $\{(t_{ni}, X_{ni}),\ i = 1(1)n\}$, where
$t_{ni} = i/n$. The $t_{ni}$ are essentially equally spaced in the unit interval. If $F_n$ is the
right-continuous (empirical) distribution function of the $T_{ni}$, and $Q_n$ is the left-continuous
quantile function, $t_{ni} = F_n(T_{ni})$ and $T_{ni} = Q_n(t_{ni})$.
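The relations $t_{ni} = F_n(T_{ni})$ and $T_{ni} = Q_n(t_{ni})$, and the notion of a concomitant, can be illustrated as follows. The helper names are hypothetical, the T's are assumed to have no ties, and the small tolerance inside `empirical_quantile` is only a guard against floating-point rounding of `u * n`.

```python
import math


def empirical_cdf(sample, t):
    """Right-continuous empirical distribution function F_n(t)."""
    return sum(1 for value in sample if value <= t) / len(sample)


def empirical_quantile(sample, u):
    """Left-continuous quantile Q_n(u) = inf{t : F_n(t) >= u}, 0 < u <= 1."""
    ordered = sorted(sample)
    index = math.ceil(u * len(ordered) - 1e-12) - 1
    return ordered[max(index, 0)]


def concomitants(pairs):
    """Order (T, X) pairs by T; the X in position i is the concomitant
    of the ith order statistic of the T's."""
    return [x for _, x in sorted(pairs, key=lambda pair: pair[0])]


if __name__ == "__main__":
    T = [0.9, 0.1, 0.5, 0.3]
    for i, t_order in enumerate(sorted(T), start=1):
        # F_n maps the ith order statistic to i/n, and Q_n maps it back.
        print(i, empirical_cdf(T, t_order), empirical_quantile(T, i / len(T)))
```

After this rank transformation, the random design can be handled on the equally spaced grid $i/n$.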

As suggested by the isotonized mean, we may wish to work with estimators of the form
$\mu_n(s) = \mathrm{slogcom}(s)\{(t, Z_n(t)),\ t \in T\}$, where slogcom(s)(A) is the left-hand slope at s of
the greatest convex minorant of the set of points A, $Z_n(t)$ is a random continuous process and
T is an interval containing s. Theorem 2.1 states that if the process $Z_n$ satisfies two
conditions, then the asymptotic behavior of $\mu_n(s)$ is known. While the conditions look quite
complicated, they can be described intuitively and verified in practice. Before examining the
conditions, it is useful to note that the proof uses the approximate estimators
$\mu_{nc}(s) = \mathrm{slogcom}(s)\{(t, Z_n(t)),\ |t-s| \le 2cn^{-p}\}$. $\mu_{nc}(s)$ is seen to be a local version of $\mu_n(s)$.
The first condition is that the increments of $Z_n$ stay above certain lines over
certain regions with sufficiently high probability. Note that these lines depend on n and c,
although this dependence is suppressed in some of the notation. Therefore weak convergence of
the processes will not imply Condition 1.

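Condition 1 (Hitting Times)

\[ \lim_{c\to\infty}\ \lim_{n\to\infty} P\{Z_n(t) - Z_n(s) < L_i(t),\ \text{some } t \in I_i\} = 0, \qquad i = 1(1)4, \]

where $L_i(t)$ is a line and $I_i$ an interval:

\[ I_1 = (-\infty, s] \cap T, \qquad I_2 = [s + 2cn^{-p}, \infty) \cap T, \]
\[ I_3 = [s, \infty) \cap T, \qquad I_4 = (-\infty, s - 2cn^{-p}] \cap T. \]

The first two lines are defined as follows:

\[ L_1(t) = -\epsilon(n,c) - (s-t)\,\mu(n,c), \]
\[ L_2(t) = \epsilon(n,c) - (s_n - t)\,\mu(n,c), \]

where

\[ \mu(n,c) = \mu(s) + (2cn^{-p})^{(1-p)/(2p)}\rho(s), \qquad s_n = s + 2cn^{-p}, \]
\[ \epsilon(n,c) = c_0\,[\,\cdots\,]\,\rho(s)\, n^{-(p+1)/2}\, c^{(1+p)/(2p)}, \]

for some constants $\mu(s)$, $p$, $\rho(s)$ and $c_0$. (The bracketed factor in $\epsilon(n,c)$, a constant
involving powers of 2, is illegible in this copy.)

$L_3$ is obtained by using the formula for $L_1$ with $\mu(n,c)$ replaced by
$\mu(s) - (2cn^{-p})^{(1-p)/(2p)}\rho(s)$.

$L_4$ is obtained from $L_2$ by making the same substitution and replacing $s_n$ by $s - 2cn^{-p}$.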


The second condition is that a suitably renormalized version of $Z_n$ converge to a Wiener
process about a convex function. This condition will be used to obtain the limiting behavior
of $\mu_{nc}(s)$ for c fixed and n large. Thus the global first condition will be used to show
that the local behavior of $Z_n$ determines the asymptotic behavior of $\mu_n$.

Condition 2 (Local Weak Convergence)

\[ \left( t,\ \frac{n\left(Z_n(s + 2cn^{-p}t) - Z_n(s) - 2cn^{-p}t\,\mu(s)\right)}{(2cn^{1-p})^{1/2}\,\sigma(s)} \right)_{|t| \le 1}
\ \xrightarrow[n\to\infty]{w}\
\left( t,\ W(t) + \frac{\rho(s)}{\sigma(s)}\,(2c)^{1/(2p)}\,|t|^{(1+p)/(2p)} \right)_{|t| \le 1}, \]

where $\mu(s)$, $\rho(s)$ and $\sigma(s)$ are constants ($\rho(s)$ and $\sigma(s)$ positive), $W(t)$ is a
two-sided standard Wiener process on $[-1,+1]$ and the convergence is weak convergence on
$C[-1,+1]$.


Theorem 2.1

If for some constant $\mu(s)$ and positive constants $\rho(s)$, $\sigma(s)$ and $p$, the processes
$Z_n$ satisfy Conditions 1 and 2 above and $\mu_n(s) = \mathrm{slogcom}(s)\{(t, Z_n(t)),\ t \in T\}$, then

\[ \frac{n^{(1-p)/2}\,(\mu_n(s) - \mu(s))}{(\sigma(s))^{1-p}(\rho(s))^{p}} \ \xrightarrow[n\to\infty]{d}\ X^{(p)}, \tag{1} \]

where $X^{(p)} = \mathrm{slogcom}(0)\{(t, W(t) + |t|^{(1+p)/(2p)}),\ t \in \mathbb{R}\}$ and $W(t)$ is a standard Wiener
process on $\mathbb{R}$ with $W(0) = 0$.


The following extension of Theorem 2.1 will be used for Corollary 2 below. It will not
be proved explicitly, but follows from a routine modification of Step 1 of the proof.

Corollary

Under the conditions of Theorem 2.1, if $D_n = o_p(n^{-p})$ and $\tilde\mu_n(s) = \mu_n(s + D_n)$, then the
conclusion of Theorem 2.1 holds for $\tilde\mu_n$.


We shall see below that the case p = 1/3 is most common in applications. In this case,
the distribution of $X^{(1/3)}$ can be described without use of convex minorants. As stated by
Prakasa Rao (1969), the distribution of $X^{(1/3)}$ is that of T/2, where T is the random
value at which $W(t) - t^2$ attains its maximum. Chernoff (1964, Theorem 1, p. 37) proves that
T has a density of the form h(x)h(-x), where h is a function involving partial derivatives
of a particular solution of the heat equation.
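The random variable T, the location of the maximum of $W(t) - t^2$, is easy to approximate by simulation. The sketch below is a rough illustration under stated assumptions: the Wiener process is replaced by a Gaussian random walk on a grid, the search is truncated to a finite window (the parabolic drift makes a maximum far from the origin very unlikely), and the function name and default arguments are hypothetical.

```python
import random


def argmax_parabolic_drift(rng, half_width=3.0, step=0.01):
    """Grid approximation to the maximizer of W(t) - t**2 over
    [-half_width, half_width], W a two-sided Wiener process with W(0) = 0."""
    n_steps = int(round(half_width / step))
    best_t, best_value = 0.0, 0.0  # value at t = 0 is W(0) - 0 = 0
    for sign in (1.0, -1.0):       # independent right and left branches
        w = 0.0
        for k in range(1, n_steps + 1):
            w += rng.gauss(0.0, step ** 0.5)  # increment with variance `step`
            t = sign * k * step
            value = w - t * t
            if value > best_value:
                best_t, best_value = t, value
    return best_t


if __name__ == "__main__":
    rng = random.Random(12345)
    draws = [argmax_parabolic_drift(rng) for _ in range(200)]
    # The law of T is symmetric about zero, so the sample mean should be small.
    print(sum(draws) / len(draws))
```

A histogram of many such draws gives a crude picture of the density h(x)h(-x); the grid and the truncation both introduce approximation error.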


3.  Applications of the Theorem

This section consists of four examples of the application of Theorem 2.1. We shall see that
Theorem 2.1 can be used to reduce the derivation of the asymptotic distribution of an estimator
of a monotone function to the verification of specific conditions, each of which is suited to
more fundamental probabilistic approaches.

Example 1:  The Isotonized Mean; Equally spaced deterministic t's

Let the functional $\theta(\cdot)$ operate on the space of cumulative distribution functions with
finite expectations by assigning to each CDF its expectation. The induced function $\mu$
satisfies $\mu(t_{ni}) = E X_{ni}$. The isotonized mean $\mu_n$ alluded to in section 1 is a natural estimator.
Since

\[ \mu_n(s) = \mathrm{slogcom}(s)\Big\{\Big(j, \sum_{i=1}^{j} X_{ni}\Big),\ 0 \le j \le n\Big\} = \mathrm{slogcom}(s)\Big\{\Big(j/n, \sum_{i=1}^{j} X_{ni}/n\Big),\ 0 \le j/n \le 1\Big\}, \]

define $Z_n(t)$ to be the random function defined by linear interpolation between
$\{(t_{nj}, \sum_{i=1}^{j} X_{ni}/n),\ 0 \le j \le n\}$. (Recall $t_{nj} = j/n$.) Thus $Z_n(t)$ is a normalized cumulative
sum process. Theorem 2.1 will be applied to give the following result:


Corollary 1

Assume the following six conditions are met:

1.  $X_{ni}$, i = 1(1)n, are mutually independent, for each n.

2.  $E X_{ni} = \mu(i/n)$.

3.  Var $X_{ni} = \sigma^2$ with $\sigma^2 \in (0, \infty)$, and $\{(X_{ni} - \mu(i/n))^2,\ 1 \le i \le n,\ n \ge 1\}$ are uniformly
    integrable.

4.  $0 < s < 1$ is such that $\sup_{t \le s-\delta} \mu(t) < \mu(s) < \inf_{t \ge s+\delta} \mu(t)$ for some $\delta > 0$, and $\mu$ is
    increasing in a neighborhood of s.

5.  $\mu$ has an Nth order derivative at s.

6.  N is the smallest positive (finite) integer with $\mu^{(N)}(s) \ne 0$.

Then

\[ n^{N/(2N+1)} \left[ \frac{(N+1)!}{\sigma^{2N}\mu^{(N)}(s)} \right]^{1/(2N+1)} (\mu_n(s) - \mu(s)) \ \xrightarrow[n\to\infty]{d}\ X^{((2N+1)^{-1})}. \]
Proof:  The conditions of Theorem 2.1 are checked with $p = (2N+1)^{-1}$, $\sigma(s) = \sigma$,
$\rho(s) = \mu^{(N)}(s)/((N+1)!)$, and $Z_n$ the normalized cumulative sum process. Denote
$\mu(n,c) - \mu(s)$ by $\tilde\epsilon(n,c)$. For notational convenience, we ignore the negligible effect of ns
failing to be an integer.

We sketch the verification of the first Hitting Time Condition. (The others are routine
variations.) In this example, the first Hitting Time Condition reduces to

\[ \lim_{c\to\infty}\ \lim_{n\to\infty} P\Big\{ -\sum_{i=k}^{ns} X_{ni}/n < -\epsilon(n,c) - (s - k/n)\,\mu(n,c),\ \text{some } k \le ns \Big\} = 0, \]

which involves

\[ q(n,c) = P\Big\{ \sum_{i=k}^{ns} (X_{ni} - \mu(t_{ni})) > n\epsilon(n,c) + (ns-k)\,\tilde\epsilon(n,c) + \sum_{i=k}^{ns} (\mu(s) - \mu(t_{ni})),\ \text{some } k \le ns \Big\}, \]

the probability that a cumulative sum process crosses a line, where the sequence of cumulative sums
depends on n and the line depends on n and on c. Since the fourth assumption of the
corollary implies that $\mu(t_{ni}) = E(X_{ni}) \le \mu(s)$ for $i \le ns$,

\[ q(n,c) \le P\Big\{ \sum_{i=k}^{ns} (X_{ni} - \mu(t_{ni})) > n\epsilon(n,c) + (ns-k)\,\tilde\epsilon(n,c),\ \text{some } k \le ns \Big\}. \]

This last expression can be written as $P\{S_{n\ell} > n\epsilon(n,c) + \ell\,\tilde\epsilon(n,c),\ \text{some } 0 \le \ell \le ns\}$,
where $S_{n\ell}$ is the $\ell$th cumulative sum of n independent random variables with variance $\sigma^2$.
Using the Dubins-Savage inequality (see Dubins & Savage (1965) or Dubins & Freedman (1965)) or
the Hajek-Renyi Inequality applied to the submartingales $\{S_{n\ell}^2,\ 0 \le \ell \le n\}$ with constants
determined by $\tilde\epsilon(n,c)$, $\sigma^2$, $n\epsilon(n,c)$ and $\ell$ (see Chow, Robbins, and Siegmund (1970), p. 25),
it can be shown that

\[ P\{S_{n\ell} > n\epsilon(n,c) + \ell\,\tilde\epsilon(n,c),\ \text{some } 0 \le \ell \le ns\} \le \left(1 + \tilde\epsilon(n,c)\, n\epsilon(n,c)/\sigma^2\right)^{-1}, \]

and the right-hand side does not depend on n and tends to zero as $c \to \infty$. This implies
$\lim_{c\to\infty} \lim_{n\to\infty} q(n,c) = 0$, as desired.


After suitable changes of notation, the weak convergence condition is seen to require the
weak convergence of

\[ \left( \frac{i}{k(n)},\ \ \sum_{j=1}^{i} \frac{X_{n,ns+j} - \mu((ns+j)/n)}{(k(n))^{1/2}\,\sigma} \ +\ \sum_{j=1}^{i} \frac{\mu(s + j/n) - \mu(s)}{\sigma\,(k(n))^{1/2}} \right), \qquad 0 \le i \le k(n), \tag{2} \]

where $k(n) = 2cn^{1-p}$. Assumption 3 ensures that the Lindeberg Condition holds for
$\{X_{n,ns+i} - \mu((ns+i)/n),\ 1 \le i \le k(n)\}$, which in turn implies that the first component of (2)
converges weakly to Brownian motion (see Billingsley (1968), p. 77, problem 10.1). The second
component is deterministic and converges uniformly to $\rho(s)\,t^{N+1}(2c)^{N+1/2}/\sigma$, where
$\rho(s) = \mu^{(N)}(s)/(N+1)!$. Therefore the local weak convergence condition is satisfied, and the
proof of Corollary 1 is complete.

□ 
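The dependence of the convergence rate on the order N of the first non-zero derivative can be tabulated directly from $p = (2N+1)^{-1}$ and the rate $n^{N/(2N+1)}$. The helper below is a small illustrative sketch; the name `slogcom_rate` is hypothetical, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction


def slogcom_rate(order):
    """For a first non-zero derivative of order N, return (p, r) with
    p = 1/(2N+1) and r = (1 - p)/2 = N/(2N+1), the exponent of n in the
    normalization n**r * (estimate - truth)."""
    if order < 1:
        raise ValueError("order must be a positive integer")
    p = Fraction(1, 2 * order + 1)
    return p, (1 - p) / 2


if __name__ == "__main__":
    for N in (1, 2, 3):
        p, r = slogcom_rate(N)
        print(f"N={N}: p={p}, rate exponent={r}")
```

For N = 1 both exponents equal 1/3, the cube-root case; as N grows the rate exponent increases toward 1/2.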

Example 2:  Isotonized Mean; Random t's

Let $\theta(\cdot)$ and $\mu$ be as in Example 1. Corollary 2 shows that the assumption of equally
spaced deterministic t's can be relaxed. In what follows, $F_n$ will be the usual right-continuous
empirical distribution function and $Q_n$ will be the left-continuous empirical quantile function.
Note that $t_{ni} = F_n(T_{ni})$ and $T_{ni} = Q_n(t_{ni})$ if there are no tied $T_{ni}$'s.

Corollary 2.

Assume that conditions 3 through 6 of Corollary 1 and the following three conditions are
met:

1.  $\{T_{ni},\ 1 \le i \le n\}$ are the order statistics of a sample of size n from a distribution
    F which possesses a positive derivative f(s) at s.

2.  $E(X_{ni}) = \mu(T_{ni})$ and $\{X_{ni} - \mu(T_{ni}),\ 1 \le i \le n\}$ is a set of mutually independent
    random variables for each n.

3.  For every n, $\{(X_{ni} - \mu(T_{ni})),\ 1 \le i \le n\}$ and $\{T_{ni},\ 1 \le i \le n\}$ are independent sets
    of random variables.

Then

\[ n^{N/(2N+1)} \left[ \frac{(N+1)!\,f(s)}{\mu^{(N)}(s)\,\sigma^{2N}} \right]^{1/(2N+1)} (\hat\mu_n(s) - \mu(s)) \ \xrightarrow{d}\ X^{((2N+1)^{-1})}, \]

where $\hat\mu_n(s)$ is the isotonized mean based on $\{(T_{ni}, X_{ni}),\ 1 \le i \le n,\ n \ge 1\}$ evaluated at s.

Note that the limiting distribution is exactly the same as in Corollary 1, except for the
presence of f(s) in the normalizing constant.


Proof:  Let p, $\sigma(s)$, and $Z_n$ be as in the proof of Corollary 1, and let
$\rho(s) = \mu^{(N)}(s)/(f(s)(N+1)!)$. Recall that $\hat\mu_n(s) = \mu_n(F_n(s))$, where $\mu_n$ is the isotonized mean based
on $\{(t_{ni}, X_{ni}),\ 1 \le i \le n\}$. Note that $D_n = F_n(s) - F(s) = O_p(n^{-1/2}) = o_p(n^{-p})$, since p < 1/2.
Since the conditions can be verified for s or for $s + D_n$, we choose the more convenient form
for each condition. By substitution, the Hitting Time Condition involves probabilities q(n,c)
which are obtained from those of the first corollary by substituting $T_{ni}$ for $t_{ni}$ and
$nF_n(s)$ for ns. Since the $T_{ni}$'s are increasing in i,
$EX_{ni} = \mu(T_{ni}) \le \mu(T_{n,nF_n(s)+1}) = \mu(Q_n(F_n(s) + 1/n)) \le \mu(s)$ for $i \le nF_n(s)$, and the Hitting
Time Condition follows from the argument for Corollary 1.

The weak convergence condition involves the convergence of the process defined by linear
interpolation between the points of

\[ \left( \frac{i}{k(n)},\ \ \sum_{j=1}^{i} \frac{X_{n,nF_n(s)+j} - \mu(T_{n,nF_n(s)+j})}{(k(n))^{1/2}\,\sigma} \ +\ \sum_{j=1}^{i} \frac{\mu(T_{n,nF_n(s)+j}) - \mu(s)}{\sigma\,(k(n))^{1/2}} \right), \qquad 0 \le i \le k(n). \tag{3} \]

The first component converges, as before, and is independent of the second component, which is
now a random process. The conditions of the corollary imply that the second component converges
weakly to the nonrandom process $(\mu^{(N)}(s)/(f(s)(N+1)!))\,t^{N+1}/\sigma$. A proof of this fact can be
based on the observation that since $\mu$ is monotone, $\{\mu(T_{nj}),\ 1 \le j \le n\}$ is the set of order
statistics of a sample from the CDF $F \circ \mu^{-1}$. Therefore the second component requires a local
weak convergence of cumulative sums of spacings. If $F \circ \mu^{-1}$ is an exponential distribution,
the fact that the spacings are independent exponentials can be used to construct an embedding
in a Brownian motion from which the result follows. The differentiability of F at s is used
for a Taylor series expansion.


Example 3:  Smoothly weighted linear combinations of order statistics; equally spaced
observations

Let J be a smooth (see below) weight function defined on (0,1) with $\int J(u)\,du = 1$.
$\theta(F)$ is defined to be the solution of $\int J(u)(Q(u) - \theta(F))\,du = 0$ (where $Q = F^{-1}$) for all
continuous F such that the integral is well-defined. For all F members of a specific
translation family, $\theta(F)$ is a percentile of F. Which percentile $\theta(F)$ gives depends on
the weight function and on the shape of F. For example, if J is symmetric about 1/2 and
the distribution determined by F is symmetric, $\theta(F)$ is the median of F. The weight
function J can be used to construct the following process from which a slogcom estimator will be
obtained.

Let $\{X_i,\ 1 \le i \le k\}_{(j)}$ denote the jth order statistic of the set $X_1, \ldots, X_k$. Then
for s in (0,1) fixed and $\ell$ positive, define

\[ Z_n(s + \ell/n) = \sum_{j=1}^{\ell} J(j/(\ell+1))\,\{X_{n,\langle ns\rangle + i},\ 1 \le i \le \ell\}_{(j)}/n \quad\text{and} \]
\[ Z_n(s - \ell/n) = -\sum_{j=1}^{\ell} J(j/(\ell+1))\,\{X_{n,\langle ns\rangle - i + 1},\ 1 \le i \le \ell\}_{(j)}/n, \]

where $\langle ns\rangle$ is the least integer greater than or equal to ns. $Z_n$ can be thought of as the
cumulative sum process of Corollaries 1 and 2 centered at s ($Z_n(t) - Z_n(s)$) with each sum of
random variables replaced by the J-weighted sum of the order statistics of the same set of random
variables. Extend $Z_n$ to a continuous process on (0,1) by linear interpolation and define
$\mu_n(s) = \mathrm{slogcom}(s)\{(t, Z_n(t)),\ 0 \le t \le 1\}$. For any finite set of integers A, define N(A)
to be the number of elements of A. Let $J_n(A)$ denote
$\sum_{j=1}^{N(A)} J(j/(N(A)+1))\,\{X_{ni},\ i \in A\}_{(j)}/N(A)$.
Then $\mu_n(s)$ can be written as $(N(L^*) \max J_n(L) + N(U^*) \min J_n(U))/N(L^* \cup U^*)$, where the
maximum is taken over the sets L of the form $\{i, \ldots, \langle ns\rangle\}$ for some i, $L^*$ is the
largest such set for which the maximum is attained, the minimum is over sets U of the form
$\{\langle ns\rangle + 1, \ldots, k\}$, and $U^*$ is the largest such set for which the minimum is obtained. Note
that $L^*$ and $U^*$ are disjoint.
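A J-weighted combination of order statistics is straightforward to compute. The sketch below is illustrative only and rests on assumptions: the combination is normalized by the number of observations, the helper names are hypothetical, and the trimming weight is a simple indicator that integrates to one but is not smooth, so it does not meet the differentiability requirements placed on J and serves only to show the mechanics.

```python
def j_weighted_average(values, weight):
    """Average over j = 1..N of weight(j/(N+1)) times the jth order statistic."""
    ordered = sorted(values)
    n = len(ordered)
    return sum(weight(j / (n + 1)) * x for j, x in enumerate(ordered, 1)) / n


def trimming_weight(u, trim=0.25):
    """Hypothetical weight: constant on [trim, 1 - trim] and zero outside,
    scaled so that it integrates to one over (0, 1)."""
    return 1.0 / (1.0 - 2.0 * trim) if trim <= u <= 1.0 - trim else 0.0


if __name__ == "__main__":
    data = [1, 2, 3, 4, 100]
    print(j_weighted_average(data, lambda u: 1.0))   # ordinary mean
    print(j_weighted_average(data, trimming_weight)) # ignores the extremes
```

With the constant weight the result is the sample mean, which is dominated by the observation 100; the trimming weight gives zero weight to the smallest and largest order statistics.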


Corollary 3.

If the following six assumptions are met, then

\[ n^{N/(2N+1)} \left[ \frac{(N+1)!}{\sigma^{2N}\mu^{(N)}(s)} \right]^{1/(2N+1)} (\mu_n(s) - \mu(s)) \ \xrightarrow{d}\ X^{((2N+1)^{-1})}. \]

1.  $X_{ni} - \mu(t_{ni})$ are independent, identically distributed random variables with cumulative
    distribution function F.

2.  $\int J(u)Q(u)\,du = 0$, $\int J(u)\,du = 1$, and $\mu$ is non-decreasing on (0,1].

3.  J is a continuously differentiable non-negative function whose derivative satisfies a
    Hölder condition for some $\gamma$, [illegible] $< \gamma \le 1$. The support of J is a compact subset
    of (0,1).

4.  $\sigma^2 = \iint J(F(x))\,J(F(y))\,F(\min(x,y))\,(1 - F(\max(x,y)))\,dx\,dy > 0$.

5.  $\mu$ has an Nth order derivative at s, 0 < s < 1, where N is the smallest (finite)
    integer with $\mu^{(N)}(s) > 0$.

6.  F is absolutely continuous, with strictly positive density f such that f converges
    to zero at infinity and f' is bounded.

The first two conditions describe the model and assert that the weight function J is
appropriate. The third condition (which includes a requirement that J trim) is used to verify
the Local Weak Convergence. The non-negativity of J will be used in the proof of the Hitting
Time Condition. The fourth condition is more a definition than a condition, since the third
condition ensures the integral is finite. The fifth condition describes the local behavior of
$\mu$ at s. The sixth condition is a regularity condition used to obtain the Cornish-Fisher
expansion needed to compute the drift component of the local weak convergence.

Corollary 3 shows that if the $X_{ni}$ are all members of the translation family generated
by the CDF F, the relative efficiency of two different isotonized linear combinations of order
statistics with weight functions $J_1$ and $J_2$ is determined by the ratio $\sigma^2(J_1,F)/\sigma^2(J_2,F)$,
where $\sigma^2(J,F) = \iint J(u)\,J(v)\,[\min(u,v) - uv]\,dQ(u)\,dQ(v)$. Corollary 1 shows that if F has finite
variance, the same formula gives the efficiency of an isotonized linear combination of order
statistics relative to the isotonized mean, although the weight function of the mean does not
satisfy the conditions of Corollary 3. This same ratio is the asymptotic relative efficiency
of two linear combinations of order statistics for estimating the location parameter of
independent, identically distributed random variables whose distribution is a member of the
location family generated by F. Therefore, all the comparisons known for the simple location
problem carry over to isotonic estimation. In particular, if F does not have a variance,
Corollary 3 applies, and the isotonized version of any linear combination of order statistics
which trims will converge in the familiar manner. However, Corollary 1 does not apply. This
extreme case shows that the isotonized mean is sensitive to wild observations and isotonized
trimmed linear combinations of order statistics are more robust to heavy tails.

The proof of Corollary 3 can be described as showing that linear combinations of order
statistics are similar enough to sums of independent random variables. For the details, see
Leurgans (1978), Chapters IV and V.

The considerations of Example 2 imply that Corollary 3 can be extended to $T_{ni}$ which are
either other tractable deterministic sequences or order statistics from a suitable distribution.

Example 4:  Estimation of a monotone density

Consider the estimation of a distribution F with support on the positive half-line using
a sample of n observations. Grenander (1956) suggested an estimator for the case in
which F has monotone (and hence decreasing) density. Grenander proved that this estimator
is the restricted maximum likelihood nonparametric estimator of the density f. Grenander's
estimator can be written as

\[ f_n(x) = -\mathrm{slogcom}(x)\{(t, -F_n(t)),\ t \ge 0\}, \]

where $F_n$ is the empirical distribution function of the observations. Prakasa Rao (1969)
proves the following result:

Corollary 4

Assume

1.  f is a decreasing density on $[0, \infty)$.

2.  $\{X_1, \ldots, X_n\}$ are a sample from this density.

3.  f is differentiable at s and f'(s) < 0.

Then

\[ n^{1/3} \left| \tfrac{1}{2} f(s) f'(s) \right|^{-1/3} (f_n(s) - f(s)) \ \xrightarrow{d}\ X^{(1/3)}. \]

To obtain Prakasa Rao's result from Theorem 2.1, use $Z_n(t) = -F_n(t)$, $\mu(s) = -f(s)$,
$\sigma^2(s) = f(s)$, $\rho(s) = -f'(s)/2$, and p = 1/3. The Hitting Time Condition can be obtained from
Wald's Lemma. The local weak convergence condition involves $H_n(t)$, the empirical distribution
function of $\{X_{ni},\ 1 \le i \le n\}$ evaluated at $s + 2cn^{-p}t$. $EH_n(t)$ is quite tractable and
gives the centering needed. A routine finite dimensional distribution and tightness proof
shows that $n(H_n(t) - EH_n(t))/(\sigma(s)(2cn^{1-p})^{1/2})$ converges weakly to W(t).

Notice that Theorem 2.1 can be used to extend Corollary 4 in several directions. The
density f may have first derivative zero. If some higher derivative of f is negative,
rates and limiting distributions based on the order of this derivative are obtained.
Furthermore, f need only be locally monotone in the sense of Example 1. Therefore $f_n$ can
be consistent even if the density function f is not totally monotone.
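Grenander's estimator, the left-hand slope of the least concave majorant of the empirical distribution function, can be computed by pooling adjacent intervals. The sketch below is illustrative and makes its assumptions explicit: observations are distinct and strictly positive, the function name and return format are hypothetical, and the pooling scheme is a standard weighted pool-adjacent-violators pass rather than a procedure taken from the report.

```python
def grenander_estimate(sample):
    """Grenander-type estimate of a decreasing density on [0, inf).

    Returns (knots, heights): heights[i] is the estimate on the interval
    (knots[i], knots[i+1]], where knots = [0] + sorted(sample).
    Assumes distinct, strictly positive observations."""
    ordered = sorted(sample)
    n = len(ordered)
    knots = [0.0] + [float(v) for v in ordered]
    blocks = []  # [probability mass, interval length, number of intervals]
    for left, right in zip(knots, knots[1:]):
        blocks.append([1.0 / n, right - left, 1])
        # Heights (mass / length) must be nonincreasing: merge while the
        # last block is taller than the block before it.
        while len(blocks) > 1 and (
            blocks[-1][0] * blocks[-2][1] > blocks[-2][0] * blocks[-1][1]
        ):
            mass, length, count = blocks.pop()
            blocks[-1][0] += mass
            blocks[-1][1] += length
            blocks[-1][2] += count
    heights = []
    for mass, length, count in blocks:
        heights.extend([mass / length] * count)
    return knots, heights


if __name__ == "__main__":
    print(grenander_estimate([1.0, 2.0, 4.0]))
    print(grenander_estimate([2.0, 3.0, 4.0]))
```

In the second call the raw slopes of the empirical distribution function increase, so all three intervals are pooled and the estimate is the uniform density on (0, 4].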


4.  Proof of Theorem 2.1

Let $Y_n$ denote the left-hand side of (1) and $X_{nc}$ the same expression with $\mu_{nc}$
replacing $\mu_n$. The theorem will be established in the following steps:

1.  For all c, $X_{nc} \xrightarrow{d} X_c$ as $n \to \infty$. ($X_c$ will be defined below.)

2.  $X_c \xrightarrow{d} X^{(p)}$ as $c \to \infty$.

3.  $\lim_{c\to\infty} \lim_{n\to\infty} P\{\mu_{nc}(s) = \mu_n(s)\} = \lim_{c\to\infty} \lim_{n\to\infty} P\{X_{nc} = Y_n\} = 1$.

By Theorem 4.2 of Billingsley (p. 25), $Y_n \xrightarrow{d} X^{(p)}$, which implies Theorem 2.1.

Step 1.  Since adding a line to a function increases the slope of the function's convex minorant
by the slope of the line,

\[ \mu_{nc}(s) - \mu(s) = \mathrm{slogcom}(s)\{(t,\ Z_n(t) - Z_n(s) - (t-s)\mu(s)),\ |t-s| \le 2cn^{-p}\}. \]

Translating the process so that the time scale of the function whose convex minorant is being
obtained is [-1, +1] and rescaling,

\[ \frac{(2cn^{1-p})^{1/2}}{\sigma(s)}\,(\mu_{nc}(s) - \mu(s)) = \mathrm{slogcom}(0)\left\{ \left( t,\ \frac{n\left(Z_n(s + 2cn^{-p}t) - Z_n(s)\right) - 2cn^{1-p}t\,\mu(s)}{(2cn^{1-p})^{1/2}\,\sigma(s)} \right),\ |t| \le 1 \right\}. \]

Since slogcom(0) is a continuous functional on C[-1,+1], the local weak convergence condition
implies the expression above converges in distribution as $n \to \infty$ to
$\mathrm{slogcom}(0)\{(t,\ W(t) + (2c)^{1/(2p)}(\rho(s)/\sigma(s))\,|t|^{(1+p)/(2p)}),\ |t| \le 1\}$, which can be shown (using
scale properties of the Wiener process) to equal in distribution $(2cK(s))^{1/2} X_c$, where
$X_c = \mathrm{slogcom}(0)\{(t,\ W(t) + |t|^{(1+p)/(2p)}),\ |t| \le 2cK(s)\}$ and $K(s) = (\rho(s))^{2p}/(\sigma(s))^{2p}$. Dividing
the last display by $(2cK(s))^{1/2}$, we see that for fixed c, $X_{nc}$ converges in distribution
to $X_c$.
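The scaling step can be written out as follows. This is a worked sketch under the assumption that the limit process is $W(t) + B|t|^{q}$ on $|t| \le 1$ with $q = (1+p)/(2p)$ and $B = (2c)^{1/(2p)}\rho(s)/\sigma(s)$, and that $K(s) = (\rho(s)/\sigma(s))^{2p}$; the symbols $a$, $B$ and $q$ are shorthand introduced here.

```latex
% Brownian scaling: for a > 0, \{W(u/a)\} has the same law as \{a^{-1/2} W(u)\}.
\begin{align*}
  a &= 2cK(s), \qquad t = u/a, \qquad |t| \le 1 \iff |u| \le a, \\
  W(t) + B|t|^{q}
    &\overset{d}{=} a^{-1/2} W(u) + B a^{-q} |u|^{q}
     = a^{-1/2}\left[ W(u) + B a^{1/2 - q} |u|^{q} \right], \\
  q - \tfrac{1}{2} &= \frac{1+p}{2p} - \frac{1}{2} = \frac{1}{2p},
  \qquad
  a^{1/(2p)} = (2c)^{1/(2p)} \frac{\rho(s)}{\sigma(s)} = B
  \;\Longrightarrow\; B a^{1/2 - q} = 1 .
\end{align*}
% A slope in the t scale is a times the slope in the u scale, hence
\[
  \operatorname{slogcom}(0)\{(t, W(t) + B|t|^{q}),\ |t| \le 1\}
  \overset{d}{=} a \cdot a^{-1/2}\,
  \operatorname{slogcom}(0)\{(u, W(u) + |u|^{q}),\ |u| \le a\}
  = (2cK(s))^{1/2} X_c .
\]
```

Dividing the normalizing factor $(2cn^{1-p})^{1/2}/\sigma(s)$ by $(2cK(s))^{1/2}$ leaves $n^{(1-p)/2}/((\sigma(s))^{1-p}(\rho(s))^{p})$, so the dependence on c cancels from the normalization and survives only in the window $|u| \le 2cK(s)$.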


Step 2.  The only difference in the definitions of $X^{(p)}$ and $X_c$ is that in $X_c$ the set of
points is restricted to $|t| \le 2cK(s)$. Therefore to show $X_c$ converges in distribution to $X^{(p)}$,
it is necessary to show that large values of t do not affect the convex minorant of
$W(t) + |t|^{(1+p)/(2p)}$. Since p < 1 implies that the exponent of |t| exceeds one, the proof of this
step follows from $W(t)/t \to 0$ a.s. as $t \to \infty$, as is pointed out by Wright (197?). For an explicit
proof in the case p = 1/3, see Prakasa Rao (1969) (Lemma [illegible], p. 14[?]).


Figure 1.  [Figure not reproduced. Horizontal axis marked at $s - 2cn^{-p}$, $s - cn^{-p}$, $s$, $s + cn^{-p}$, $s + 2cn^{-p}$.]

Step 3.  Figure 1 displays a realization of the process $Z_n$. At s, the greatest convex
minorant of $Z_n$ must lie entirely below $L_1$, the line connecting $(s + cn^{-p}, Z_n(s + cn^{-p}))$ and
$(s - cn^{-p}, Z_n(s - cn^{-p}))$. Therefore no points of $Z_n$ above this line can affect
$\mathrm{slogcom}(s)\{(t, Z_n(t)),\ t \in T\}$, and to establish this last step of the proof it suffices to show
that $Z_n(t)$ lies above $L_1$ for all t with $|t-s| > 2cn^{-p}$ with high probability. $L_1$ has
both random slope and random intercept. It is more convenient to work with $L_2$ and $L_3$, also
superimposed on the diagram. $L_3$ is the line through $(s + cn^{-p}, Z_n(s + cn^{-p}))$ with
non-random slope $\mu(n,c)$ and $L_2$ is the line through $(s - cn^{-p}, Z_n(s - cn^{-p}))$ with slope
$\mu(s) - (2cn^{-p})^{(1-p)/(2p)}\rho(s)$. It can be shown that if $Z_n$ is above $L_2$ for
$|t - (s - cn^{-p})| > cn^{-p}$ and above $L_3$ for $|t - (s + cn^{-p})| > cn^{-p}$ (as in the diagram), then
$Z_n$ lies above $L_1$ for $|t-s| > 2cn^{-p}$ and $Y_n = X_{nc}$. Therefore it suffices to show that the
conditions of the theorem imply that the probability that $Z_n$ lies above two lines, for each of
two separate intervals of t, is one in the appropriate limit. We shall show that the Hitting Time
Condition with i = 1 and the Local Weak Convergence Condition imply that $Z_n$ lies above $L_3$ for
$t \le s$ with appropriately high probability. The other three Hitting Time Conditions are used in
the same manner, and then the Bonferroni Inequality can be used to complete the proof.

Thus it remains to show that $\lim_{c \to \infty} \lim_{n \to \infty} p(n,c) = 1$, where (rearranging) $p(n,c) = P\{Z_n(s) - L_3(s) > L_3(t) - Z_n(t) + Z_n(s) - L_3(s),\ t \le s\}$. The probability that $Z_n(s) - L_3(s)$ exceeds $L_3(t) - Z_n(t) + Z_n(s) - L_3(s)$ is at least the probability that $Z_n(s) - L_3(s)$ is greater than a fixed constant $\varepsilon(n,c)$ and that this same fixed constant is greater than $L_3(t) - Z_n(t) + Z_n(s) - L_3(s)$. Applying the Bonferroni Inequality to the intersection of the above two events, and recalling the definition of $L_j(t)$ in the Hitting Time Condition, it is easy to show that

(4) $p(n,c) \ge P\{Z_n(s) - L_3(s) > \varepsilon(n,c)\} - P\{Z_n(t) - Z_n(s) < L_j(t),\ \text{some } t \le s\}$.

The Hitting Time Condition therefore implies that the lim lim of the last term (minus sign included) is zero. Using the Local Weak Convergence Condition with $t = -1/2$ it can be shown (for details, see Leurgans (1978), chapter 3, section 3) that the lim of the first term is

$1 - \Phi(-\lambda \varepsilon c^{1/(2p)})$, where $0 < \varepsilon < 1$ (from the definition of $\varepsilon(n,c)$), $\lambda = (2^{(1-p)/(2p)} - 1) \rho(s)/\sigma(s)$ is positive (because $p < 1$), and $\Phi$ is the cumulative distribution function of the standard normal distribution. Therefore the lim lim of the first term in (4) is 1, and the proof of the theorem is complete.
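The Bonferroni step behind inequality (4) can be written out as follows. The event names $A$ and $B$, and the symbol $\varepsilon$ for the fixed constant, are shorthand introduced here for illustration and are not notation from the original.

```latex
\begin{align*}
A &= \{\, Z_n(s) - L_3(s) > \varepsilon \,\}, \\
B &= \{\, \varepsilon > L_3(t) - Z_n(t) + Z_n(s) - L_3(s) \ \text{for all } t \le s \,\}, \\
A \cap B &\subseteq \{\, Z_n(t) > L_3(t) \ \text{for all } t \le s \,\}, \\
p(n,c) &\ge P(A \cap B) = 1 - P(A^c \cup B^c) \ge 1 - P(A^c) - P(B^c) = P(A) - P(B^c).
\end{align*}
```

Here $B^c$ is the event that $Z_n(t) - Z_n(s) \le L_3(t) - L_3(s) - \varepsilon$ for some $t \le s$. Because $L_3$ has non-random slope, the right-hand side is a non-random line in $t$, so $B^c$ corresponds to the hitting event subtracted in (4).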


5. Remarks

Example 1 is a generalization of Brunk's Theorem 5.2. It should be remarked that Brunk's condition that "the observations satisfy Lindeberg's condition" can mislead the unwary: from the proof of Example 1 we see that the observations must satisfy local Lindeberg conditions, which are unrelated to a global Lindeberg condition. Wright's paper also generalizes Brunk's Theorem, and is the only paper known to the author with results for $N \neq 1$. Wright does not require that $N$ be an integer and allows a different variance structure, but otherwise his results correspond to Examples 1 and 2.

The estimators of Examples 1 and 4 have not been discussed in the literature. However, the results of Robertson and Wright (1975) include monotone estimators of the form $\tilde{\mu}_n(s) = \max \min \operatorname{med}(i, j)$, in the notation of Example 3. Robertson and Wright give conditions under which their minimax estimators are consistent for $\mu(s)$, but their methods do not give a rate of convergence. Corollary 1 gives such rates for slogcom estimators $\mu_n$. It is natural to conjecture that $\mu_n$ and $\tilde{\mu}_n$ have the same asymptotic behavior, even though $\mu_n$ and $\tilde{\mu}_n$ are identical only in the case of Example 1. Unlike $\mu_n(s)$, $\tilde{\mu}_n(s)$ is not always a monotone function of $s$. Isotonized percentiles of the Robertson and Wright type are also discussed by Casady and Cryer (1976).
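The relationship between the two kinds of estimator can be illustrated on a small data set. The sketch below is an illustration under stated assumptions, not the construction used in the paper: `pava` is the standard pool-adjacent-violators algorithm for the isotonized mean at equally weighted design points, and `max_min` is a hypothetical helper that evaluates a max-min formula for an arbitrary summary statistic. With the mean as the statistic, the max-min formula is known to reproduce the isotonic regression; with the median it gives an estimator of the Robertson and Wright type.

```python
from statistics import mean, median

def pava(y):
    """Isotonic (non-decreasing) least-squares fit by pooling adjacent violators."""
    blocks = []  # each block is [sum, count]
    for v in y:
        blocks.append([float(v), 1])
        # Merge while the previous block mean exceeds the last block mean.
        while (len(blocks) > 1
               and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fit = []
    for total, count in blocks:
        fit.extend([total / count] * count)
    return fit

def max_min(y, stat):
    """Max over i <= k of min over j >= k of stat(y[i..j]), for each index k."""
    n = len(y)
    return [
        max(min(stat(y[i:j + 1]) for j in range(k, n)) for i in range(k + 1))
        for k in range(n)
    ]

if __name__ == "__main__":
    y = [1, 3, 2, 4, 3, 5]
    print(pava(y))             # isotonized means
    print(max_min(y, mean))    # max-min of means
    print(max_min(y, median))  # max-min of medians
```

The brute-force `max_min` costs on the order of $n^3$ evaluations of the statistic, so it is meant only for checking small examples.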

Example 4 is due to Prakasa Rao (1969), and was the first asymptotic distribution obtained. Related hazard function estimators are discussed by Prakasa Rao (1970), Barlow and van Zwet (1970) and Barlow, et al. (1972). All of these results can be obtained from Theorem 2.1 and can be generalized in the manner described for Example 4.

Recall that the isotonized mean at $s$ ($\mu_n(s)$) is the mean of the $X_{ni}$'s over an adaptively chosen neighborhood of $s$. Theorem 5.8 of Barlow, et al. (1972) and Theorem 1.2 of ivivis (1972) point out that for each $s$, if slightly wider deterministic windows centered at $s$ are used, the resulting estimators converge more rapidly. However, this result appears to be the same sort of superefficiency result obtained in Example 1 for $N > 1$. In the case of Barlow, et al., $s$ must be at the center of every window. In Example 1, $s$ must be exactly a point at which $\mu'(s) = 0$, but some other derivative is positive. If one is interested in estimation of an entire function, both kinds of $s$ are isolated. Also, the deterministic window estimators need not give monotone estimators of $\mu(s)$.


The fact that $\mu_n$ can be consistent in some cases even when $\mu$ is not monotone is reminiscent of Theorem 3.4 of Barlow, et al. (1972), which states that the likelihood ratio test that some group means (normal errors, variances known and equal) exhibit a specified partial order, against the null hypothesis that the means are all equal, is an unbiased test of some alternatives which do not have the specified partial order against the same null hypothesis. The application to estimation does not appear to have been noted previously.